IRRA at TREC 2012: Divergence From Independence (DFI)

Author

  • Bekir Taner Dinçer
Abstract

The IRRA (IR-Ra) group participated in the 2012 Web track with a system that implements a non-parametric term weighting method based on measuring divergence from independence (DFI). This is the group's third year of participation, following the TREC 2009 and 2010 Web tracks. This year, the aim is to evaluate a new DFI-based term weighting model developed on the basis of Shannon’s information theory (Shannon, 1949), together with a heuristic approach that is expected to improve early precision when used with DFI term weighting. The TERRIER retrieval platform, version 3.0 (Ounis et al., 2007), is used to index and search the ClueWeb09-T09B data set (the “Category B” data set), a subset of about 50 million English Web pages. During indexing and searching, terms are stemmed (Porter’s stemmer as implemented in TERRIER), but stopwords are not removed. The result sets are filtered using the fusion of two spam-page lists provided by Cormack et al. (2010) for the ClueWeb09 document collection.
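
The abstract does not give the formula of the new model, so the following is only a minimal sketch of the general idea of measuring divergence from independence. It assumes a standardized-residual form: the observed term frequency is compared with the frequency expected if term and document were independent. The function name `dfi_score` and its parameters are hypothetical, and this is not claimed to be the Shannon-based model evaluated in the paper.

```python
import math


def dfi_score(tf, doc_len, coll_tf, coll_len, query_tf=1):
    """Illustrative DFI-style weight (assumed standardized form).

    tf        observed frequency of the term in the document
    doc_len   document length in tokens
    coll_tf   total frequency of the term in the collection
    coll_len  total number of tokens in the collection
    query_tf  frequency of the term in the query
    """
    # Frequency expected under independence of term and document.
    expected = coll_tf * doc_len / coll_len
    # A term that occurs no more often than expected gets no credit.
    if tf <= expected:
        return 0.0
    # Standardized divergence of the observed from the expected frequency.
    divergence = (tf - expected) / math.sqrt(expected)
    return query_tf * math.log2(divergence + 1.0)


if __name__ == "__main__":
    # Expected frequency is 100 * 100 / 10000 = 1, observed is 4.
    print(dfi_score(tf=4, doc_len=100, coll_tf=100, coll_len=10_000))
```

Note that the sketch has no tunable parameters: every quantity is read from document and collection statistics, which is what "non-parametric" refers to here.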

Similar resources

IRRA at TREC 2009: Index Term Weighting Based on Divergence From Independence Model

The IRRA (IR-Ra) group participated in the 2009 Web track (both the adhoc task and the diversity task) and the Million Query track. This year, the major concern is to examine the effectiveness of a novel, nonparametric index term weighting model, divergence from independence (DFI). The notion of independence, which is the notion behind the well-known statistical exploratory data analysis technique calle...

IRRA at TREC 2010: Index Term Weighting by Divergence From Independence Model

The IRRA (IR-Ra) group participated in the 2010 Web track. This year, the major concern is to examine the effect of supplementary methods on the effectiveness of the new nonparametric index term weighting model, divergence from independence (DFI). Every written text document contains words, but the words used in individual documents may differ due to many divergent (latent) factors, such as topi...

Best and Fairest: An Empirical Analysis of Retrieval System Bias

In this paper, we explore the bias of term weighting schemes used by retrieval models. Here, we consider bias as the extent to which a retrieval model unduly favours certain documents over others because of characteristics within and about the document. We set out to find the least biased retrieval model/weighting. This is largely motivated by the recent proposal of a new suite of retrieval mod...

Fondazione Ugo Bordoni at TREC 2004

Our participation in TREC 2004 aims to extend and improve the use of the DFR (Divergence From Randomness) models with Query Expansion (QE) for the robust track. We experiment with a new parameter-free version of Rocchio’s Query Expansion and use the information theory based function, InfoDFR to predict the AP (Average Precision) of queries. We also study how the use of an external collection af...

Graph-Based Text Representation For Novelty Detection

We discuss several feature sets for novelty detection at the sentence level, using the data and procedure established in task 2 of the TREC 2004 novelty track. In particular, we investigate feature sets derived from graph representations of sentences and sets of sentences. We show that a highly connected graph produced by using sentence-level term distances and pointwise mutual information can ...

Publication date: 2012